Abstract. We investigate the optimal transportation problem on
configuration space for the quadratic cost. It is shown that, as usual,
provided that the corresponding Wasserstein distance is finite, there exists
a unique optimal measure and that this measure is supported by the graph
of the derivative (in the sense of the Malliavin calculus) of a "concave"
(in a sense to be defined below) function. For finite point processes, we
give a necessary and sufficient condition for the Wasserstein distance to
be finite.



1. Introduction

The optimal transportation problem dates back to the eighteenth century,
when G. Monge addressed the optimal way to move earth particles from
one location to another. It was only in the 1940s that Kantorovitch gave
this problem its modern form and a complete solution. According to the
Kantorovitch approach, the optimal transportation problem, or
Monge-Kantorovitch problem (MKP for short), reads as follows: given two
probability measures $\mu$ and $\nu$ on a Polish space $X$ and a cost
function $c$ on $X \times X$, does there exist a probability measure
$\gamma$ on $X \times X$ which minimizes $\int c \, d\beta$ among all
probability measures $\beta$ on $X \times X$ with first (respectively
second) marginal $\mu$ (respectively $\nu$)? One can furthermore ask whether
the optimal measure is unique and which properties it has. So far, the most
investigated situations suppose that $X = \mathbb{R}^n$ or a
finite-dimensional manifold, with a cost function of the form
$c(x, y) = h(|x - y|)$ where $h$ is a convex (or concave) function on
$\mathbb{R}$.

Varying cost functions and underlying spaces yields numerous interesting
inequalities, often with optimal constants (see [Vil03] and references
therein), or new insights on known theorems such as Strassen's theorem
about stochastic ordering (see [RR98a, RR98b]). Moreover, in a large part
of the investigated cases, the optimal measure is unique and is supported by
the graph of a function $T$, i.e., $\gamma = (\mathrm{Id} \otimes T)^* \mu$.
This map $T$ gives rise to a coupling, called optimal, between the measures
$\mu$ and $\nu$: if $A$ is a r.v. distributed according to $\mu$ then $T(A)$
is distributed according to $\nu$, and this construction of the two
distributions on the same probability space is optimal in the sense that it
minimizes $E[c(A, B)]$ among all the r.v. $B$ distributed according to
$\nu$. Optimal coupling is a well-known tool to obtain inequalities between
random variables (see [Tho00]) but, to the best of our knowledge, the optimal
coupling has to be explicitly built to obtain these bounds. Meanwhile,
optimal transportation theory, in the solved cases, indicates that the
optimal coupling (or transportation) map can be written as the graph of a
"concave" function. Hence, independently of the precise description of the
map, one can obtain interesting inequalities just from this property of the
optimal map.

Our goal here is to develop the theory of optimal transportation for point
processes (or configuration spaces), with the objective of obtaining a
machinery yielding inequalities similar to those obtained by Stein's method
[BHJ92, BM02, Xia00].

The first step is to define a cost between configurations. Several
possibilities can be envisioned; we chose here a cost with a strong physical
interpretation: given two configurations (or sets of points),
$(x_1, \dots, x_m)$ and $(y_1, \dots, y_m)$, the cost is roughly defined as
$\inf_{\sigma \in \mathfrak{S}_m} \sum_{j} |x_j - y_{\sigma(j)}|^2$ where
$\mathfrak{S}_m$ is the group of permutations over $\{1, \dots, m\}$ (see
Section 3 for the precise definition). The point is then to determine the
cost to go from a configuration with $m$ points to a configuration with $m'$
points, when $m \neq m'$. In order to keep a physical meaning to the
definition of our cost, it seems sensible to impose an infinite value on
something which is impossible. The negative consequence of this choice is
that severe constraints are imposed (see Theorem 4.2) on two finite point
processes for their Wasserstein distance to be finite. On the other hand,
the positive consequence is that the optimal measure has a well-defined
structure. These constraints disappear when we deal with locally finite but
not finite point processes, and we still have a rigid structure for the
optimal measure. It turns out that proving the uniqueness and describing the
form of the optimal measure is here highly similar to the proof of the same
properties for the Wiener space (see [FU04]).

This paper is organized as follows: in Section 2, we describe the
Monge-Kantorovitch problem in its general setting, for a generic cost
function on a product of two abstract Polish spaces, since we will need to
apply these general results to different particular situations in the
subsequent sections. Section 3 is devoted to general properties of the
Wasserstein distance on configuration spaces, irrespective of the finiteness
properties of the considered point processes. In Section 4, we work under
the assumption that only a finite number of atoms are random in the
$\mu$-configurations and we slightly modify our cost function to
simultaneously solve optimal transportation between finite point processes
and pave the way to the analysis for locally finite point processes. This
latter analysis is done in Section 5.

2. Preliminaries



For $X$ and $Y$ two Polish spaces, for $\mu$ (respectively $\nu$) a
probability measure on $X$ (respectively $Y$), $\Sigma(\mu, \nu)$ is the set
of probability measures on $X \times Y$ whose first marginal is $\mu$ and
second marginal is $\nu$. We also need to consider a lower semi-continuous
function $c$ from $X \times Y$ to $\mathbb{R}^+$. The Monge-Kantorovitch
problem associated to $\mu$, $\nu$ and $c$, denoted by MKP($\mu$, $\nu$, $c$)
for short, consists in finding

$$\inf_{\gamma \in \Sigma(\mu, \nu)} \int_{X \times Y} c(x, y) \, d\gamma(x, y). \tag{1}$$

More precisely, since $X$ and $Y$ are Polish and $c$ is l.s.c., it is known
from the general theory of optimal transportation that there exists an
optimal measure $\gamma \in \Sigma(\mu, \nu)$ and that the minimum coincides
with

$$\sup_{(F, G) \in \Phi_c} \Bigl( \int_X F \, d\mu + \int_Y G \, d\nu \Bigr),$$

where $(F, G)$ belongs to $\Phi_c$ whenever $F \in L^1(d\mu)$,
$G \in L^1(d\nu)$ and $F(x) + G(y) \le c(x, y)$. We will denote by
$\mathcal{T}_c(\mu, \nu)$ the value of the infimum in (1). Solving the
Monge-Kantorovitch problem on Polish spaces is then essentially proving the
finiteness of (1) and the uniqueness of the optimal measure. For
$X = Y = \mathbb{R}^k$ and $c$ taken to be the squared euclidean distance,
we have the second moment condition: for $\mathcal{T}_c(\mu, \nu)$ to be
finite it is sufficient that $\int \|x\|^2 \, d\mu$ and
$\int \|x\|^2 \, d\nu$ are finite, where $\|x\|$ is the euclidean norm (see
[Vil03]). For further reference, we denote by $\mathcal{T}_e$ (where $e$
stands for euclidean) this distance between probability measures on
$\mathbb{R}^k$. If $\mu$ is the Gaussian measure on $\mathbb{R}^k$ and $\nu$
is absolutely continuous with respect to $\mu$ with Radon-Nikodym density
$L$, it is sufficient that $L$ has a finite entropy, i.e., that the
$\mu$-expectation of $L \ln L$ is finite, for $\mathcal{T}_e(\mu, L\mu)$ to
be finite. This criterion extends to the infinite dimensional setting where
$X = Y$ is a Wiener space and $c(x, y) = 2^{-1} \|x - y\|_H^2$, where $H$ is
the associated Cameron-Martin space (see [FU04]). In full generality, we
know from [RR98a] that if there exist $F \in L^1(d\mu)$ and
$G \in L^1(d\nu)$ such that $c(x, y) \le F(x) + G(y)$, then
$\mathcal{T}_c(\mu, \nu)$ is finite.

Once the finiteness of $\mathcal{T}_c$ is ensured, it remains to know
whether the optimal measure is unique. To this end, it is essential to see
that a measure $\gamma \in \Sigma(\mu, \nu)$ is optimal if and only if its
support is $c$-cyclically monotone (see [Lev99, Rüs96]): for any
$((x_i, y_i), i = 1, \dots, m) \in (\operatorname{supp} \gamma)^m$, we have

$$\sum_{i=1}^m c(x_i, y_i) \le \sum_{i=1}^m c(x_i, y_{\sigma(i)}),$$

for any $\sigma \in \mathfrak{S}_m$, the group of permutations over
$\{1, \dots, m\}$. Moreover, the support of any optimal measure is included
in the $c$-super-gradient of a $c$-concave function: for
$F : X \to \bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$, its
$c$-super-gradient, $\partial^c F$, is the subset of $X \times Y$ of
$(x, y)$ such that $c(x, y) < \infty$ and

$$F(x) - F(z) \ge c(x, y) - c(z, y), \quad \text{for any } z \text{ such that } F(z) < +\infty.$$
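
For a finite family of pairs, the $c$-cyclical monotonicity condition stated
above can be tested by brute force. The following Python sketch is only an
illustration: the function name `is_c_cyclically_monotone`, the tolerance
`tol` and the one-dimensional quadratic cost used in the usage note are
assumptions made here, not part of the cited results. It enumerates all
permutations, so it is only practical for very small families.

```python
import itertools


def is_c_cyclically_monotone(pairs, cost, tol=1e-12):
    """Check sum_i c(x_i, y_i) <= sum_i c(x_i, y_sigma(i)) for every permutation.

    pairs : list of (x, y) tuples (assumed finite and small)
    cost  : function (x, y) -> float, possibly returning float('inf')
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    base = sum(cost(x, y) for x, y in pairs)
    for perm in itertools.permutations(range(len(pairs))):
        permuted = sum(cost(xs[i], ys[perm[i]]) for i in range(len(pairs)))
        if base > permuted + tol:
            # Re-matching the y's strictly lowers the total cost.
            return False
    return True
```

With the cost `lambda x, y: 0.5 * (x - y) ** 2` on the real line, the family
`[(0.0, 0.0), (1.0, 1.0)]` passes the test, whereas the crossed family
`[(0.0, 1.0), (1.0, 0.0)]` fails it, since swapping the targets lowers the
total cost.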

The section at $x$, $\partial^c F(x)$, is the set
$\{y \in Y, (x, y) \in \partial^c F\}$. A function $F : X \to \mathbb{R}$ is
called $c$-concave if there exist an index set $I$, $(y_i, i \in I)$ a
family of elements of $Y$ and $(\alpha_i, i \in I)$ a family of real numbers
such that

$$F(x) = \inf_{i \in I} \bigl( c(x, y_i) + \alpha_i \bigr).$$

If we prove that the $c$-super-gradient of a $c$-concave function is
single-valued, we are done, i.e., we have proved the uniqueness of the
optimal measure. Indeed, if $\partial^c F(x)$ is reduced to a singleton for
$\mu$-a.s. any $x \in X$, this means that $\partial^c F(x)$ is closed, and
the selection theorem then implies that there exists a measurable map $T$
such that $(x, y)$ belongs to $\partial^c F$ if and only if $y = T(x)$. The
uniqueness then follows from the following theorem, which we borrow from
[FU04].

Theorem 2.1 (See [FU04]). Let $X$ and $Y$ be two Polish spaces and $c$ be a
lower semi-continuous function from $X \times Y$ to
$\mathbb{R}^+ \cup \{+\infty\}$. Consider the Monge-Kantorovitch problem
associated to the marginals $\mu$ on $X$, $\nu$ on $Y$ and $c$. Assume that
for any optimal measure $\gamma$, there exists a measurable map $T_\gamma$
such that
$\operatorname{supp} \gamma \subset \{(x, T_\gamma(x)), x \in \operatorname{supp} \mu\}$.
Then, there exist a unique optimal measure $\gamma$ and a unique measurable
map $T$ such that $\gamma = (\mathrm{Id} \otimes T)^* \mu$.

Proof. For any probability measure $\gamma$ on $X \times Y$, we denote by
$J(\gamma)$ the integral of $c$ with respect to $\gamma$:

$$J(\gamma) = \int_{X \times Y} c(x, y) \, d\gamma(x, y).$$

Assume that $\gamma_1$ and $\gamma_2$ are two different optimal measures.
Since $J$ is linear with respect to $\gamma$,
$\gamma_0 = (\gamma_1 + \gamma_2)/2$ is also optimal. We denote by $T_0$ a
map whose graph contains the support of $\gamma_0$. Furthermore, for
$i = 1, 2$, $\gamma_i$ is absolutely continuous with respect to $\gamma_0$.
We denote by $L_i$ the Radon-Nikodym derivative of $\gamma_i$ with respect
to $\gamma_0$. For any $f \in C_b(X)$, we have

$$\int_X f(x) \, d\mu(x) = \int_{X \times Y} f(x) \, d\gamma_i(x, y)
  = \int_{X \times Y} f(x) L_i(x, y) \, d\gamma_0(x, y)
  = \int_X f(x) L_i(x, T_0(x)) \, d\mu(x).$$

Therefore, we must have $L_i(x, T_0(x)) = 1$ $\mu$-a.s. or, in other words,
$L_i = 1$ $\gamma_0$-a.s. for $i = 1, 2$. This means that
$\gamma_1 = \gamma_2$, hence the uniqueness of the optimal measure for
MKP($\mu$, $\nu$, $c$).

Assume now that there exist two maps $T_1$ and $T_2$ such that
$\gamma = (\mathrm{Id} \otimes T_j)^* \mu$ for $j = 1, 2$. This implies that
for any $f \in C_b(Y)$, $f \circ T_1 = f \circ T_2$ $\mu$-a.s., hence that
$T_1 = T_2$ $\mu$-almost surely. □

The simplest way to prove that the $c$-super-gradient of a $c$-concave
function is single-valued is to show that a $c$-concave function is
"differentiable" in some sense. That is why we need to introduce a notion of
gradient on configuration space. The notations are mainly those of [AKR98].
Let $\Gamma_X$ be the configuration space over a Polish space $X$, i.e.,

$$\Gamma_X = \{\eta \subset X;\ \eta \cap K \text{ is a finite set for every compact } K \subset X\}.$$

We identify $\eta \in \Gamma_X$ and the positive Radon measure
$\sum_{x \in \eta} \varepsilon_x$. Throughout this paper, $\Gamma_X$ is
endowed with the vague topology, i.e., the weakest topology such that for
all $f \in C_0$ (continuous with compact support on $X$), the maps

$$\eta \mapsto \int f \, d\eta = \sum_{x \in \eta} f(x)$$

are continuous. When $f$ is the indicator function of a subset $B$, we will
use the shorter notation $\eta(B)$ to denote the integral of $1_B$ with
respect to $\eta$. We denote by $\mathcal{B}(\Gamma_X)$ the corresponding
Borel $\sigma$-algebra.

The intensity measure of a probability measure $\mu$ on $\Gamma_X$ is
denoted by $E_\mu \eta$ and defined by
$(E_\mu \eta)(B) = E_\mu[\eta(B)]$, for any $B \in \mathcal{B}(X)$. We
assume henceforth that $E_\mu \eta$ is a positive Radon measure on
$\mathcal{B}(X)$.



WASSERSTEIN DISTANCE ON CONFIGURATION SPACE 






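In what follows, we will take $X = \mathbb{R}^k$ for some $k \ge 1$. Let
$V(X)$ be the set of $C^\infty$ vector fields on $X$ and
$V_0(X) \subset V(X)$ the subset consisting of all vector fields with
compact support. For $v \in V_0(X)$, for any $x \in X$, the curve

$$t \mapsto \mathcal{V}_t^v(x) \in X$$

is defined as the solution of the following Cauchy problem

$$\frac{d}{dt} \mathcal{V}_t^v(x) = v\bigl(\mathcal{V}_t^v(x)\bigr), \qquad \mathcal{V}_0^v(x) = x. \tag{2}$$

The associated flow $(\mathcal{V}_t^v, t \in \mathbb{R})$ induces a curve
$(\mathcal{V}_t^v)^* \eta = \eta \circ (\mathcal{V}_t^v)^{-1}$,
$t \in \mathbb{R}$, on $\Gamma_X$: if $\eta = \sum_{x \in \eta} \varepsilon_x$
then $(\mathcal{V}_t^v)^* \eta = \sum_{x \in \eta} \varepsilon_{\mathcal{V}_t^v(x)}$.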

Hypothesis I. Throughout this paper, we assume that $\mu$ (or $\nu$) is a
Borel probability measure on $\Gamma_X$ such that the following conditions
hold.

i) $\eta(\{x\}) \in \{0, 1\}$ for all $x \in X$ and $\mu$-a.s. $\eta$.

ii) Either $\mu(\eta : \eta(X) < +\infty) = 1$ or
$\mu(\eta : \eta(X) = +\infty) = 1$.

iii) For all $v \in V_0(X)$ and $t \in \mathbb{R}$, $\mu$ is quasi-invariant
with respect to the flow $(\mathcal{V}_t^v)^*$ on $\Gamma_X$, i.e.,
$\mu \circ ((\mathcal{V}_t^v)^*)^{-1}$ is equivalent to $\mu$.

We are then in a position to define the notion of differentiability on
$\Gamma_X$. A measurable function $F : \Gamma_X \to \mathbb{R}$ is said to
be differentiable if for any $v \in V_0(X)$, the following limit exists:

$$\lim_{t \to 0} t^{-1} \bigl( F((\mathcal{V}_t^v)^* \eta) - F(\eta) \bigr).$$

We then denote by $\nabla^\Gamma_v F(\eta)$ the preceding limit and by
$\nabla^\Gamma F(\eta)$ the corresponding gradient (see [AKR98]), which is
defined by the identity:

$$\int_X \nabla^\Gamma_x F(\eta) \cdot v(x) \, d\eta(x) = \nabla^\Gamma_v F(\eta),$$

for all $v \in V_0(X)$.
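
As a hedged illustration of these definitions, consider the linear
functional $F(\eta) = \int f \, d\eta$, where $f$ is assumed here to be a
smooth function with compact support on $X$ (this example is not taken from
the text above; it is the standard first example of a differentiable
function on configuration space). The sums below are finite because $\eta$
is locally finite and $f$ has compact support.

```latex
\begin{align*}
F(\eta) &= \int f \, d\eta = \sum_{x \in \eta} f(x), \\
\nabla^\Gamma_v F(\eta)
  &= \lim_{t \to 0} t^{-1} \sum_{x \in \eta}
     \bigl( f(\mathcal{V}_t^v(x)) - f(x) \bigr)
   = \sum_{x \in \eta} \nabla f(x) \cdot v(x)
   = \int_X \nabla f(x) \cdot v(x) \, d\eta(x), \\
\nabla^\Gamma_x F(\eta) &= \nabla f(x) \quad \text{for } x \in \eta .
\end{align*}
```

The second line uses the Cauchy problem defining the flow, which gives
$\frac{d}{dt} f(\mathcal{V}_t^v(x))|_{t=0} = \nabla f(x) \cdot v(x)$; the
third line then follows by comparison with the identity defining the
gradient.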



3. WASSERSTEIN DISTANCE 

We consider on $X = \mathbb{R}^k$ the cost function
$d(x, y) = 2^{-1} \|x - y\|^2$, where $\|x\|$ denotes the euclidean norm of
$x \in X$, and we define a cost between configurations (see also
[BM02, Xia00]) as the 'lifting' of $d$ on $\Gamma_X$:

$$c(\eta_1, \eta_2) = \inf \Bigl\{ \int d(x, y) \, d\beta(x, y),\ \beta \in \Gamma_{\eta_1, \eta_2} \Bigr\},$$

where $\Gamma_{\eta_1, \eta_2}$ denotes the set of
$\beta \in \Gamma_{X \times X}$ having marginals $\eta_1$ and $\eta_2$.
According to [RS99], $c$ is lower semi-continuous on
$\Gamma_X \times \Gamma_X$. We can then set the Monge-Kantorovitch problem
for configuration spaces.
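
For two finite configurations, the lifted cost reduces to a minimization
over matchings of the points, and it is infinite when the numbers of points
differ. The Python sketch below illustrates this reading of the definition
under explicit assumptions: configurations are represented as lists of
coordinate tuples, the name `config_cost` is hypothetical, and all
permutations are enumerated, which is only reasonable for a handful of
points.

```python
import itertools
import math


def config_cost(eta, omega):
    """Brute-force value of c(eta, omega) for finite configurations.

    eta, omega : lists of points, each point a tuple of coordinates.
    Returns the infimum over matchings of sum 0.5 * ||x - y||^2,
    or math.inf when the two configurations have different sizes.
    """
    if len(eta) != len(omega):
        # No configuration on X x X has marginals of different masses.
        return math.inf
    best = math.inf
    for perm in itertools.permutations(range(len(omega))):
        total = 0.0
        for x, j in zip(eta, perm):
            total += 0.5 * sum((a - b) ** 2 for a, b in zip(x, omega[j]))
        best = min(best, total)
    return best
```

For instance, `config_cost([(0.0,), (3.0,)], [(4.0,), (1.0,)])` matches
`0.0` with `1.0` and `3.0` with `4.0`, while
`config_cost([(0.0,)], [])` is infinite.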

Definition 1. Let $\mu$ and $\nu$ be two probability measures on
$(\Gamma_X, \mathcal{B}(\Gamma_X))$. We say that a probability measure
$\gamma$ on
$(\Gamma_X \times \Gamma_X, \mathcal{B}(\Gamma_X \times \Gamma_X))$ is a
solution of the Monge-Kantorovitch problem associated to the couple
$(\mu, \nu)$ and to the cost $c$ (MKP($\mu$, $\nu$, $c$) for short) if the
first marginal of $\gamma$ is $\mu$, the second one is $\nu$ and if

$$J(\gamma) = \int c(\eta, \zeta) \, d\gamma(\eta, \zeta)
  = \inf \Bigl\{ \int c(\eta, \zeta) \, d\beta(\eta, \zeta) : \beta \in \Sigma(\mu, \nu) \Bigr\}.$$

Since $\Gamma_X$ is Polish, this infimum is attained and is equal to

$$\sup \Bigl\{ \int F(\eta) \, d\mu(\eta) + \int G(\zeta) \, d\nu(\zeta) : (F, G) \in \Phi_c \Bigr\},$$

where $\Phi_c$ is the set of pairs of measurable, real-valued functions $F$
and $G$ such that $F$ (resp. $G$) belongs to $L^1(d\mu)$ (resp.
$L^1(d\nu)$) and $F(\eta) + G(\zeta) \le c(\eta, \zeta)$. The Wasserstein
distance between $\mu$ and $\nu$ is the square root of
$\mathcal{T}_c(\mu, \nu)$.

Since the cost $c$ is infinite whenever the two configurations do not have
the same mass, we have the following theorem.

Theorem 3.1. Let $\mu$ and $\nu$ be two probability measures on the
configuration space $\Gamma_X$. If the Monge-Kantorovitch cost, with respect
to $c$, is finite then

$$\mu(\eta(X) = n) = \nu(\eta(X) = n) \quad \text{for any } n \in \mathbb{N} \cup \{+\infty\}.$$

Proof. There exists at least one measure $\gamma$ such that

$$+\infty > \int c \, d\gamma
  = \int_{\eta(X) = \omega(X)} c(\eta, \omega) \, d\gamma(\eta, \omega)
  + \int_{\eta(X) \neq \omega(X)} c(\eta, \omega) \, d\gamma(\eta, \omega).$$

This implies that $\gamma(\eta(X) \neq \omega(X)) = 0$. It follows that

$$\nu(\omega(X) = n) = \gamma(\omega(X) = n)
  = \gamma(\omega(X) = n;\ \eta(X) = n) + \gamma(\omega(X) = n;\ \eta(X) \neq n)
  = \gamma(\omega(X) = n;\ \eta(X) = n),$$

for any $n \in \mathbb{N} \cup \{+\infty\}$. By the very same reasoning, it
also holds that $\mu(\eta(X) = n) = \gamma(\omega(X) = n;\ \eta(X) = n)$ for
any $n$, and thus that $\mu(\eta(X) = n) = \nu(\eta(X) = n)$ for any
$n \in \mathbb{N} \cup \{+\infty\}$. □

4. Finite point processes 

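Consider $\Lambda$ a compact set of $X$ and let $\zeta$ be fixed in
$\Gamma_{\Lambda^c}$. We define $c_\zeta$ as:

$$c_\zeta : \Gamma_\Lambda \times \Gamma_X \to \mathbb{R}^+ \cup \{+\infty\}, \qquad
  (\eta, \omega) \mapsto c(\eta + \zeta, \omega).$$

Let $\mu$ be a probability measure on $\Gamma_\Lambda$ and $\nu$ a
probability measure on $\Gamma_X$; we denote by
$\mathcal{T}_{c_\zeta}(\mu, \nu)$ the $c_\zeta$-Wasserstein distance between
$\mu$ and $\nu$:

$$\mathcal{T}_{c_\zeta}(\mu, \nu) = \inf_{\gamma \in \Sigma(\mu, \nu)} \int c_\zeta(\eta, \omega) \, d\gamma(\eta, \omega).$$

Since $\Lambda$ is compact and $E_\mu \eta$ is supposed to be a Radon
measure, the configurations of $\Gamma_\Lambda$ have $\mu$-a.s. a finite
number of atoms. It is then useful to think of $\Gamma_\Lambda$ as the
disjoint union of the spaces $\Gamma_\Lambda^{(n)}$ for $n$ running from $0$
to infinity, where

$$\Gamma_\Lambda^{(n)} = \{\eta \in \Gamma_\Lambda,\ \eta(\Lambda) = n\}.$$

Then, consider
$\tilde\Lambda^n = \{(x_1, \dots, x_n) \in \Lambda^n,\ x_i \neq x_j \text{ for } i \neq j\}$;
there is a bijection between $\tilde\Lambda^n / \mathfrak{S}_n$ and
$\Gamma_\Lambda^{(n)}$:

$$s_\Lambda^n : \tilde\Lambda^n / \mathfrak{S}_n \to \Gamma_\Lambda^{(n)}, \qquad
  \{x_1, \dots, x_n\} \mapsto \sum_{i=1}^n \varepsilon_{x_i}.$$

The topology of $\tilde\Lambda^n / \mathfrak{S}_n$ induced by the usual
topology of $\Lambda^n$ thus defines a locally compact metrizable Hausdorff
topology on $\Gamma_\Lambda^{(n)}$. Since $\Lambda$ is compact, this
topology coincides with the restriction to $\Gamma_\Lambda^{(n)}$ of the
vague topology on $\Gamma_\Lambda$. We put on $\Gamma_\Lambda^{(n)}$ the
associated Borel $\sigma$-algebra $\mathcal{B}(\Gamma_\Lambda^{(n)})$. For
any map $F$ from $\Gamma_\Lambda$ into a measurable space
$(Y, \mathcal{Y})$, for any integer $n$, we can consider $F_n$, the
restriction of $F$ to $\Gamma_\Lambda^{(n)}$:

$$F_n : \Gamma_\Lambda^{(n)} \to Y, \qquad \eta \mapsto F_n(\eta) = F(\eta).$$

Since $\Gamma_\Lambda^{(n)}$ is closed in $\Gamma_\Lambda$, it is a Polish
space and $F_n$ is measurable from
$(\Gamma_\Lambda^{(n)}, \mathcal{B}(\Gamma_\Lambda^{(n)}))$ into
$(Y, \mathcal{Y})$.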

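We now identify $\sigma \in \mathfrak{S}_n$ and its action over $\Lambda^n$,
which maps $x = (x_1, \dots, x_n)$ to
$\sigma x = (x_{\sigma(1)}, \dots, x_{\sigma(n)})$. Let $F_n^s$ be a
measurable function from $\Lambda^n$ into a measurable space
$(Y, \mathcal{Y})$. We say that $F_n^s$ is symmetric whenever for any
$\sigma \in \mathfrak{S}_n$, $F_n^s(\sigma x) = F_n^s(x)$ for any
$x \in \Lambda^n$. Identify now $\tilde\Lambda^n / \mathfrak{S}_n$ with a
subset $\Lambda'$ of $\tilde\Lambda^n$; since $\tilde\Lambda^n$ has $n!$
disjoint connected components, the map

$$j^n : \tilde\Lambda^n \to \tilde\Lambda^n / \mathfrak{S}_n \times \mathfrak{S}_n, \qquad
  x \mapsto (\tilde x, \sigma) = (j_1^n(x), j_2^n(x)),$$

where $\sigma$ is such that $\sigma x = \tilde x \in \Lambda'$, is a
homeomorphism. Furthermore, $j_1^n$ is a local diffeomorphism. Hence, any
symmetric measurable (respectively continuous or differentiable) function
$F_n^s$ from $\Lambda^n$ into $\mathbb{R}$ can be identified with a
measurable (respectively continuous or differentiable) function $F_n$ from
$\Gamma_\Lambda^{(n)}$ into $\mathbb{R}$ with

$$F_n(\eta) = F_n^s\bigl( (j^n)^{-1}((s_\Lambda^n)^{-1}(\eta), \sigma) \bigr)$$

for any $\sigma \in \mathfrak{S}_n$ or, equivalently, with

$$F_n(\eta) = \frac{1}{n!} \sum_{\sigma \in \mathfrak{S}_n} F_n^s\bigl( (j^n)^{-1}((s_\Lambda^n)^{-1}(\eta), \sigma) \bigr).$$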

Conversely, any function $F_n$ from $\Gamma_\Lambda^{(n)}$ into $\mathbb{R}$
gives rise to a symmetric function $F_n^s$ from $\Lambda^n$ into
$\mathbb{R}$ by $F_n^s(x) = F_n(s_\Lambda^n(j_1^n(x)))$, with the same
regularity (measurable, continuous or differentiable). Accordingly, every
probability measure $\mu_n$ on $\Gamma_\Lambda^{(n)}$ can be viewed as a
symmetric (i.e., invariant under the action of $\mathfrak{S}_n$) probability
measure $\mu_n^s$ on $\Lambda^n$ and vice versa.

Let $\mu$ be a probability measure on $\Gamma_\Lambda$ and consider the
disintegration of $\mu$ along the map $(\eta \mapsto \eta(\Lambda))$:

$$\mu(B) = \sum_{n \ge 0} \mu(B \mid \eta(\Lambda) = n) \, \mu(\eta(\Lambda) = n).$$

We denote by $\mu_n$ the measure $\mu(\,\cdot \mid \eta(\Lambda) = n)$. The
measure $\mu(\eta(\Lambda) = n) \, \mu_n$ is the so-called Janossy measure
of order $n$ (see [DVJ03]). We say that $\mu$ is regular whenever for any
$n \ge 1$, $\mu_n^s$, the symmetric measure associated to $\mu_n$, is
absolutely continuous with respect to the Lebesgue measure on $\Lambda^n$.

Remark 4.1. Since $X$ is Polish, it can be embedded as a $G_\delta$ in a
compact metric space $X'$. If a probability measure $\nu$ on $\Gamma_X$ is
such that $\nu(\omega(X) < +\infty) = 1$, we can embed
$(\Gamma_X, \mathcal{B}(\Gamma_X), \nu)$ into
$(\Gamma_{X'}, \mathcal{B}(\Gamma_{X'}), \nu_{X'})$, with
$\operatorname{supp} \nu_{X'} = \operatorname{supp} \nu$ and
$\nu_{X'}(\omega(X) < +\infty) = 1$. Thus, all the previous results
established on $\Gamma_\Lambda$ are valid on $\Gamma_{X'}$, hence on
$\Gamma_X$. In particular, to every probability measure $\nu_n$ on
$\Gamma_X^{(n)} \subset \Gamma_{X'}^{(n)}$, we can associate, as above, a
symmetric probability measure $\nu_n^s$ on $X^n$.

The next theorem follows from the previous considerations.

Theorem 4.1. Assume that $\mu$ is a regular probability measure on
$\Gamma_\Lambda$ and let $F$ be measurable from $\Gamma_\Lambda$ into
$\mathbb{R}$. Then, $F$ is $\mu$-a.s. differentiable, on its domain, if and
only if $F_n^s$ is $\mu_n^s$-a.s. differentiable, on its domain, for any
integer $n$.

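The euclidean symmetric cost on $X^n$, denoted by $c_n^s$, is defined as:

$$c_n^s(x, y) = \inf_{\sigma \in \mathfrak{S}_n} \frac{1}{2} \|x - \sigma y\|^2.$$

It is immediate that

$$c_n^s(x, y) = c\bigl( s_X^n(j_1^n(x)),\ s_X^n(j_1^n(y)) \bigr) \tag{3}$$

and that

$$c(\eta, \omega) = c_n^s(x, y) \tag{4}$$

for any $x \in (s_\Lambda^n \circ j_1^n)^{-1}(\{\eta\})$ and any
$y \in (s_X^n \circ j_1^n)^{-1}(\{\omega\})$.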

Lemma 4.1. A function $F_n^s$ from $\Lambda^n$ into $\mathbb{R}$ is
$c_n^s$-concave if and only if $F_n^s - \|x\|^2/2$ is concave in the usual
sense and $F_n^s$ is symmetric.

Proof. By its very definition, a $c_n^s$-concave function $F_n^s$ is of the
form:

$$F_n^s(x) = \inf_{i \in I} \bigl( c_n^s(x, y_i) + \alpha_i \bigr)
  = \inf_{i \in I,\ \sigma \in \mathfrak{S}_n} \Bigl( \frac{1}{2} \|x - \sigma y_i\|^2 + \alpha_i \Bigr), \tag{5}$$

where $y_i$ belongs to $\Lambda^n$ for any $i \in I$. This clearly implies
that $F_n^s$ is symmetric and euclidean-concave, and euclidean-concavity is
known to be equivalent to the concavity of
$(x \mapsto F_n^s(x) - \|x\|^2/2)$ in the usual sense (see [Vil03]), hence
the result.

It only remains to prove that a symmetric and euclidean-concave function can
be written as in (5). Since $F_n^s$ is euclidean-concave,

$$F_n^s(x) = \inf_{i \in I} \Bigl( \frac{1}{2} \|x - y_i\|^2 + \alpha_i \Bigr),$$

for some index set $I$, $(\alpha_i, i \in I)$ a family of real numbers and
$(y_i, i \in I)$ some elements of $X^n$. Since $F_n^s$ is symmetric,
$F_n^s(x) = F_n^s(\sigma x) = \inf_{\sigma \in \mathfrak{S}_n} F_n^s(\sigma x)$,
thus

$$F_n^s(x) = \inf_{i \in I,\ \sigma \in \mathfrak{S}_n} \Bigl( \frac{1}{2} \|\sigma x - y_i\|^2 + \alpha_i \Bigr)
  = \inf_{i \in I,\ \sigma \in \mathfrak{S}_n} \Bigl( \frac{1}{2} \|x - \sigma y_i\|^2 + \alpha_i \Bigr).$$

The proof is thus complete. □

It follows from the Lebesgue-a.s. differentiability of concave functions
that we have:

Corollary 4.1. Let $n \ge 1$ and $F_n^s$ be a $c_n^s$-concave function.
Then, $F_n^s$ is Lebesgue-a.s. differentiable on its domain.

Corollary 4.2. Let $n \ge 1$, $\mu_n^s$ an absolutely continuous measure on
$\Lambda^n$ and $F_n^s$ a $c_n^s$-concave function. Then,
$\partial^{c_n^s} F_n^s$ is $\mu_n^s$-a.s. single-valued.

Proof. We already know (see Corollary 4.1) that $F_n^s$ is Lebesgue-a.s.
differentiable. From (4), it is clear that

$$\partial^{c_n^s} F_n^s(x) = \partial^{c} F_n\bigl( s_\Lambda^n(j_1^n(x)) \bigr).$$

The previous theorem implies that the rightmost set is reduced to a
singleton for $\mu_n^s$-almost all $x$, hence the result. □

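Recall now that for two configurations $\eta + \zeta$ and $\omega$ at finite
$c$-distance, $\beta_{\eta + \zeta, \omega}$ is one measure on
$\Gamma_{X \times X}$ which realizes this distance.

Definition 2. For any $\eta \in \Gamma_X$, for any $\Lambda \subset X$,
$\pi_\Lambda(\eta) = \eta \cap \Lambda$. To any map $t$ from $X$ to $X$, we
associate the map $t^\Gamma$ from $\Gamma_X$ to itself, defined by

$$t^\Gamma\Bigl( \sum_{x \in \eta} \varepsilon_x \Bigr) = \sum_{x \in \eta} \varepsilon_{t(x)} \quad \text{for any } \eta = \sum_{x \in \eta} \varepsilon_x.$$

For any probability measure $\mu$ on $\Gamma_X$, $\pi_\Lambda \mu$ is the
image measure of $\mu$ by $\pi_\Lambda$. For any
$\eta = (\eta_1, \eta_2) \in \Gamma_X \times \Gamma_X$, we set
$p_i(\eta) = \eta_i$ for $i = 1, 2$. Accordingly, for any probability
measure $\gamma$ on $\Gamma_X \times \Gamma_X$, $p_i \gamma$ is the image of
$\gamma$ by $p_i$. We also introduce
$\pi_i^\Lambda := \pi_\Lambda \circ p_i$, thus $\pi_1^\Lambda(\eta, \omega)$
is the restriction to $\Lambda$ of $\eta$. For any configuration $\beta$ on
$X \times X$, define $r_\Lambda$ by:

$$r_\Lambda : \Gamma_{X \times X} \to \Gamma_{X \times X}, \qquad
  \beta \mapsto r_\Lambda \beta = \beta \cap (\Lambda \times X).$$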

At last, rf- denotes ft o r A . 

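The main result of this section is the following.

Theorem 4.2. Let $\mu$ (resp. $\nu$) be a probability measure on
$\Gamma_\Lambda$ (resp. $\Gamma_X$) satisfying Hypothesis I and
$\zeta \in \Gamma_{\Lambda^c}$. Assume that $\mu$ is regular and that
$\mathcal{T}_{c_\zeta}(\mu, \nu)$ is finite. Then, for any optimal measure
$\rho$, there exists a $c_\zeta$-concave function $F$ such that
$\operatorname{supp} \rho \subset \partial^{c_\zeta} F$ and for any
$\omega \in \partial^{c_\zeta} F(\eta)$,

$$r_\Lambda(\beta_{\eta + \zeta, \omega}) = \sum_{x \in \eta} \varepsilon_{(x,\ x - \nabla^\Gamma_x F(\eta))},$$

for any $\beta_{\eta + \zeta, \omega}$ realizing $c(\eta + \zeta, \omega)$.

Proof. $\Gamma_\Lambda$ and $\Gamma_X$ are Polish spaces, hence there exist
at least an optimal measure $\rho$ and a $c_\zeta$-concave function $F$ such
that $\operatorname{supp} \rho \subset \partial^{c_\zeta} F$. By the
definition of $c_\zeta$-concavity, for any $\eta \in \Gamma_\Lambda$,

$$F(\eta) = \inf_{i \in I} \bigl( c(\eta + \zeta, \omega_i) + \alpha_i \bigr)
  = \inf_{i \in I}\ \inf_{\varpi_i \subset \omega_i} \bigl( c(\eta, \varpi_i) + c(\zeta, \varpi_i^c) + \alpha_i \bigr),$$

where the inner infimum runs over the sub-configurations $\varpi_i$ of
$\omega_i$ and $\varpi_i^c = \omega_i - \varpi_i$. Since
$c(\zeta, \varpi_i^c) + \alpha_i$ does not depend on $\eta$, $F_n^s$ is
$c_n^s$-concave for any integer $n$. Then Corollary 4.1 implies that $F_n^s$
is Lebesgue-a.s. differentiable, which in turn entails that $F_n^s$ is
$\mu_n^s$-a.s. differentiable, since $\mu_n^s$ is absolutely continuous with
respect to the Lebesgue measure. Thus, according to Corollary 4.1 and
Theorem 4.1, $F$ has $\mu$-a.s. directional derivatives for any
$v \in V_0(\Lambda)$. Let $v \in V_0(\Lambda)$; any
$\omega \in \partial^{c_\zeta} F(\eta)$ must satisfy

$$F((\mathcal{V}_t^v)^* \eta) - F(\eta) \le c((\mathcal{V}_t^v)^* \eta + \zeta, \omega) - c(\eta + \zeta, \omega),$$

for any $t \in \mathbb{R}$, and $c(\eta + \zeta, \omega) < +\infty$. For any
$\beta_{\eta + \zeta, \omega}$ realizing $c(\eta + \zeta, \omega)$,

$$c((\mathcal{V}_t^v)^* \eta + \zeta, \omega)
  \le \frac{1}{2} \int_{\Lambda \times X} \|\mathcal{V}_t^v(x) - y\|^2 \, d\beta_{\eta + \zeta, \omega}(x, y)
  + \frac{1}{2} \int_{\Lambda^c \times X} \|x - y\|^2 \, d\beta_{\eta + \zeta, \omega}(x, y).$$

Hence,

$$F((\mathcal{V}_t^v)^* \eta) - F(\eta)
  \le \frac{1}{2} \int_{\Lambda \times X} \bigl( \|\mathcal{V}_t^v(x) - y\|^2 - \|x - y\|^2 \bigr) \, d\beta_{\eta + \zeta, \omega}(x, y).$$

Dividing both sides of this inequality by $t > 0$ and letting $t$ go to $0$,
we get

$$\frac{d}{dt} F((\mathcal{V}_t^v)^* \eta) \Big|_{t=0}
  \le \int_{\Lambda \times X} (x - y) \cdot v(x) \, d\beta_{\eta + \zeta, \omega}(x, y).$$

Applying this inequality to $-v$, we deduce that for any
$v \in V_0(\Lambda)$,

$$\nabla^\Gamma_v F(\eta) = \int_{\Lambda \times X} (x - y) \cdot v(x) \, d\beta_{\eta + \zeta, \omega}(x, y).$$

We infer from this relation that for any
$\omega \in \partial^{c_\zeta} F(\eta)$,

$$r_\Lambda(\beta_{\eta + \zeta, \omega}) = \sum_{x \in \eta} \varepsilon_{(x,\ x - \nabla^\Gamma_x F(\eta))},$$

for any $\beta_{\eta + \zeta, \omega}$ realizing $c(\eta + \zeta, \omega)$. □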

The last theorem means that only a part of any element $\omega$ of
$\partial^{c_\zeta} F(\eta)$ is uniquely determined, namely the part which
will be married to $\eta$ in an optimal coupling between $\omega$ and
$\eta + \zeta$. Nonetheless, when $\zeta = \emptyset$, this means that
$\partial^{c_\emptyset} F(\eta)$ is reduced to one point, which is
$(\mathrm{Id} - \nabla^\Gamma F)^\Gamma(\eta) = \sum_{x \in \eta} \varepsilon_{x - \nabla^\Gamma_x F(\eta)}$.

Corollary 4.3. Assume that $\mu_n^s$ and $\nu_n^s$ are two absolutely
continuous, symmetric, probability measures on $\Lambda^n$ and that
$\mathcal{T}_{c_n^s}(\mu_n^s, \nu_n^s)$ is finite. Then there exists a
unique optimal measure $\rho_n$ for MKP($\mu_n^s$, $\nu_n^s$, $c_n^s$) and
there exists a unique map $t_n^s$ such that
$\rho_n = (\mathrm{Id} \otimes t_n^s)^* \mu_n^s$.

Proof. View $\tilde\Lambda^n$ as a subset of the Polish space $\Lambda^n$.
Since $\Lambda^n \setminus \tilde\Lambda^n$ has null Lebesgue measure, we
can then view $\mu_n^s$ and $\nu_n^s$ as absolutely continuous, symmetric,
probability measures on $\Lambda^n$. Since $\Lambda^n$ is Polish, there
exists at least one optimal measure for MKP($\mu_n^s$, $\nu_n^s$, $c_n^s$).
For any optimal measure $\rho$, there exists a $c_n^s$-concave function
$f_n$ such that
$\operatorname{supp} \rho \subset \partial^{c_n^s} f_n$. According to
Corollary 4.2, $\partial^{c_n^s} f_n$ is $\mu_n^s$-a.s. single-valued, hence
the uniqueness of $\rho_n$ and $t_n^s$ follows from Theorem 2.1. □

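We can then state:

Theorem 4.3. Let $\mu$ be a regular probability measure on $\Gamma_\Lambda$
and $\nu$ be a probability measure on $\Gamma_X$. The Monge-Kantorovitch
distance, associated to $c$, between $\mu$ and $\nu$ is finite if and only
if the following two conditions hold:

(a) $\mu(\eta(\Lambda) = n) = \nu(\omega(X) = n)$ for any integer $n \ge 0$,

(b) $\sum_{n \ge 1} \mathcal{T}_c(\mu_n, \nu_n)^2 \, \mu(\eta(\Lambda) = n)$
is finite.

Moreover, we have

$$\mathcal{T}_c(\mu, \nu)^2 = \sum_{n \ge 1} \mathcal{T}_c(\mu_n, \nu_n)^2 \, \mu(\eta(\Lambda) = n), \tag{6}$$

and there exists a unique $c$-concave map $F$ such that the unique optimal
measure $\rho$ is given by

$$\rho = \bigl( \mathrm{Id} \otimes (\mathrm{Id} - \nabla^\Gamma F)^\Gamma \bigr)^* \mu,$$

where

$$(\mathrm{Id} - \nabla^\Gamma F)^\Gamma(\eta) = \sum_{x \in \eta} \varepsilon_{x - \nabla^\Gamma_x F(\eta)}.$$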
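
Conditions (a) and (b) and formula (6) can be read as a simple recipe once
the level-wise distances are known. The Python sketch below is only meant to
illustrate that bookkeeping, under assumptions that are not in the theorem:
the laws of the number of points are given as finitely supported
dictionaries, the level-wise distances $\mathcal{T}_c(\mu_n, \nu_n)$ are
supplied by the caller, and the names `wasserstein_from_levels`,
`mu_counts`, `nu_counts` and `level_dist` are hypothetical.

```python
import math


def wasserstein_from_levels(mu_counts, nu_counts, level_dist, tol=1e-12):
    """Combine level-wise distances following conditions (a), (b) and formula (6).

    mu_counts, nu_counts : dict n -> probability of having n points
    level_dist           : dict n -> distance between the conditioned laws
    Returns math.inf if the laws of the number of points differ,
    otherwise the square root of sum_n level_dist[n]**2 * mu_counts[n].
    """
    for n in set(mu_counts) | set(nu_counts):
        if abs(mu_counts.get(n, 0.0) - nu_counts.get(n, 0.0)) > tol:
            # Condition (a) fails, so the distance is infinite.
            return math.inf
    total = 0.0
    for n, p in mu_counts.items():
        if n >= 1 and p > 0.0:
            # Level n = 0 contributes nothing: both configurations are empty.
            total += level_dist[n] ** 2 * p
    return math.sqrt(total)
```

For example, with `mu_counts = nu_counts = {0: 0.5, 1: 0.5}` and
`level_dist = {1: 2.0}`, the squared distance is $2^2 \times 0.5 = 2$.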

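Proof. If $\mathcal{T}_c(\mu, \nu)$ is finite then, according to Theorem
3.1, condition (a) is satisfied. Thus, we have

$$\begin{aligned}
\mathcal{T}_c(\mu, \nu)^2
  &= \inf_{\gamma \in \Sigma(\mu, \nu)} \sum_{n \ge 1} \int_{\eta(\Lambda) = \omega(X) = n} c(\eta, \omega) \, d\gamma(\eta, \omega) \\
  &= \inf_{\gamma \in \Sigma(\mu, \nu)} \sum_{n \ge 1} \int c(\eta, \omega) \, d(\gamma \mid \eta(\Lambda) = n)(\eta, \omega) \ \mu(\eta(\Lambda) = n) \\
  &= \sum_{n \ge 1} \inf_{\gamma_n \in \Sigma(\mu_n, \nu_n)} \int c(\eta, \omega) \, d\gamma_n(\eta, \omega) \ \mu(\eta(\Lambda) = n) \\
  &= \sum_{n \ge 1} \mathcal{T}_{c_n^s}(\mu_n^s, \nu_n^s)^2 \, \mu(\eta(\Lambda) = n) \\
  &= \sum_{n \ge 1} \mathcal{T}_c(\mu_n, \nu_n)^2 \, \mu(\eta(\Lambda) = n),
\end{aligned}$$

where $\mu_n^s$ (resp. $\nu_n^s$) is the symmetric measure on $\Lambda^n$
(resp. $X^n$) corresponding to $\mu_n$ (resp. $\nu_n$). Let $\rho$ be an
optimal measure, whose existence is guaranteed because $\Gamma_\Lambda$ and
$\Gamma_X$ are Polish. We infer from Theorem 4.2 that there exists a
$c_\emptyset$-concave function $F$, whose $c_\emptyset$-super-gradient is
$\mu$-a.s. single-valued, such that
$\operatorname{supp} \rho \subset \partial^{c_\emptyset} F$. According to
Theorem 2.1, this implies that $\rho$ and $T$ are unique and that
$\rho = (\mathrm{Id} \otimes T)^* \mu$. At last, Theorem 4.2 entails that
$T = (\mathrm{Id} - \nabla^\Gamma F)^\Gamma$.

In the converse direction, since $\mu$ is regular and
$\mathcal{T}_c(\mu_n, \nu_n) = \mathcal{T}_{c_n^s}(\mu_n^s, \nu_n^s)$ is
finite for any $n \ge 1$, there exists, for any $n \ge 1$, according to
Corollary 4.3, a measure $\rho_n^s$ such that

$$\mathcal{T}_{c_n^s}(\mu_n^s, \nu_n^s)^2 = \int c_n^s(x, y) \, d\rho_n^s(x, y).$$

Now, we set

$$\rho(A) = \sum_{n \ge 1} \rho_n\bigl( A \cap (\Gamma_\Lambda^{(n)} \times \Gamma_X^{(n)}) \bigr) \, \mu(\eta(\Lambda) = n).$$

Since $\mu(\eta(\Lambda) = n) = \nu(\omega(X) = n)$ and since $\rho_n$
belongs to $\Sigma(\mu_n, \nu_n)$, it is clear that $\rho$ belongs to
$\Sigma(\mu, \nu)$. Moreover, we have:

$$\begin{aligned}
\int c(\eta, \omega) \, d\rho(\eta, \omega)
  &= \sum_{n \ge 1} \mu(\eta(\Lambda) = n) \int c(\eta, \omega) \, d\rho_n(\eta, \omega) \\
  &= \sum_{n \ge 1} \mu(\eta(\Lambda) = n) \int_{\Lambda^n \times X^n} c_n^s(x, y) \, d\rho_n^s(x, y) \\
  &= \sum_{n \ge 1} \mu(\eta(\Lambda) = n) \, \mathcal{T}_{c_n^s}(\mu_n^s, \nu_n^s)^2,
\end{aligned} \tag{7}$$

and the last quantity is finite according to the hypothesis. Thus,
$\mathcal{T}_c(\mu, \nu)$ is finite. It remains to prove that the measure
$\rho$ constructed above is optimal. To this end, recall that, as mentioned
in the preliminaries, it is sufficient that $\operatorname{supp} \rho$ be
$c$-cyclically monotone. We infer from the finiteness of $\int c \, d\rho$
that for any $(\eta, \omega)$ in $\operatorname{supp} \rho$,
$\eta(\Lambda) = \omega(X)$. For $m$ any integer, let
$((\eta_i, \omega_i), i = 1, \dots, m)$ be a family of elements of
$\operatorname{supp} \rho$. Set
$I_n = \{i \in \{1, \dots, m\},\ \eta_i(\Lambda) = n\}$; we can then write

$$\sum_{i=1}^m c(\eta_i, \omega_i) = \sum_{n=1}^{+\infty} \sum_{i \in I_n} c(\eta_i, \omega_i).$$

Let $\sigma \in \mathfrak{S}_m$. If for some $n$, $\sigma I_n$ differs from
$I_n$, then $\sum_{i \in I_n} c(\eta_i, \omega_{\sigma(i)})$ is infinite and
it is clear that

$$\sum_{i=1}^m c(\eta_i, \omega_i) \le \sum_{i=1}^m c(\eta_i, \omega_{\sigma(i)}).$$

Thus, we now assume that for any $n \ge 1$, $\sigma I_n = I_n$, i.e.,
$\omega_i(X) = \omega_{\sigma(i)}(X)$ for any $i = 1, \dots, m$. Since for
any $n \ge 1$, $\rho_n^s$ is $c_n^s$-cyclically monotone, so is $\rho_n$.
Moreover,
$\operatorname{supp} \rho \cap (\Gamma_\Lambda^{(n)} \times \Gamma_X^{(n)}) = \operatorname{supp} \rho_n$,
thus for any $n \ge 1$,

$$\sum_{i \in I_n} c(\eta_i, \omega_i) \le \sum_{i \in I_n} c(\eta_i, \omega_{\sigma(i)}).$$

By summation, we infer that
$\sum_{i=1}^m c(\eta_i, \omega_i) \le \sum_{i=1}^m c(\eta_i, \omega_{\sigma(i)})$
for any $\sigma \in \mathfrak{S}_m$. This amounts to saying that
$\operatorname{supp} \rho$ is $c$-cyclically monotone, hence that $\rho$ is
an optimal measure (unique according to the first part of the proof) for
MKP($\mu$, $\nu$, $c$). We deduce from (7) that (6) holds true. □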
4.1. Example: Wasserstein distance with respect to a Poisson process. Let
$\sigma$ be a diffuse (by which we mean absolutely continuous with respect
to the Lebesgue measure) Radon measure on $X$; the Poisson measure on
$\Gamma_X$ with intensity $\sigma$, denoted by $\mu_\sigma$, is the unique
probability measure on $(\Gamma_X, \mathcal{B}(\Gamma_X))$ such that

$$E\Bigl[ \exp\Bigl( \int f \, d\eta \Bigr) \Bigr] = \exp\Bigl( \int_X (e^{f(x)} - 1) \, d\sigma(x) \Bigr), \tag{8}$$

for all $f \in C_0$. It is well known that $\mu_\sigma$ satisfies
Hypothesis I and that $\mu_\sigma$ is regular since
$\mu_n = \sigma^{\otimes n}$, thus the previous results apply. Let
$\sigma_1$ and $\sigma_2$ be two diffuse probability measures on $X$ with
finite Wasserstein distance with respect to the euclidean cost on
$X = \mathbb{R}^k$:

$$\mathcal{T}_e(\sigma_1, \sigma_2)^2 = \inf_{\gamma \in \Sigma(\sigma_1, \sigma_2)} \frac{1}{2} \int_{X \times X} \|x - y\|^2 \, d\gamma(x, y) < +\infty.$$

We denote by $t$ the transport map from $\sigma_1$ to $\sigma_2$ and by
$\phi$ its potential, i.e., the convex map from $X$ to $\mathbb{R}$ such
that $\nabla \phi = t$. By $\nabla$, we mean here the usual gradient in $X$.
For $W$ and $Y$ two spaces and $f : W \to \mathbb{R}$ and
$g : Y \to \mathbb{R}$, we denote by $f \oplus g$ the map defined on
$W \times Y$ by $(f \oplus g)(x, y) = f(x) + g(y)$.

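Lemma 4.2. The map
$t^{\otimes (n)} : (x_1, \dots, x_n) \in X^n \mapsto (t(x_1), \dots, t(x_n))$
is the transport map from $\sigma_1^{\otimes n}$ to $\sigma_2^{\otimes n}$.
Moreover,

$$\mathcal{T}_e(\sigma_1^{\otimes n}, \sigma_2^{\otimes n})^2 = n \, \mathcal{T}_e(\sigma_1, \sigma_2)^2.$$

Proof. It is immediate that
$t^{\otimes (n)} = \nabla(\oplus_{i=1}^n \phi)$ and that
$\oplus_{i=1}^n \phi$ is convex, thus $t^{\otimes (n)}$ is cyclically
monotone (with respect to the squared euclidean cost on $X^n$). Moreover,
$(t^{\otimes (n)})^*(\sigma_1^{\otimes n}) = \sigma_2^{\otimes n}$, hence
$t^{\otimes (n)}$ is the optimal transport map between
$\sigma_1^{\otimes n}$ and $\sigma_2^{\otimes n}$. Then,

$$\begin{aligned}
\mathcal{T}_e(\sigma_1^{\otimes n}, \sigma_2^{\otimes n})^2
  &= \int_{X^n} \frac{1}{2} \|x - t^{\otimes (n)}(x)\|^2 \, d\sigma_1^{\otimes n}(x) \\
  &= \sum_{j=1}^n \int_X \frac{1}{2} \|x_j - t(x_j)\|^2 \, d\sigma_1(x_j) \\
  &= n \, \mathcal{T}_e(\sigma_1, \sigma_2)^2.
\end{aligned}$$

The proof is thus complete. □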
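It then follows from Theorem 4.3 that:

Theorem 4.4. For $\sigma_1$ and $\sigma_2$ two diffuse probability measures
on $X$, if $\mathcal{T}_e(\sigma_1, \sigma_2) < +\infty$ then
$\mathcal{T}_c(\mu_{\sigma_1}, \mu_{\sigma_2})$ is finite. If $t$ and
$\phi$, with $t = \nabla \phi$, are respectively the transport map and its
associated potential for MKP($\sigma_1$, $\sigma_2$, $c_e$), then

$$T = \sum_{n \ge 1} t^{\otimes (n)} \, 1_{\Gamma_X^{(n)}} \quad \text{and} \quad
  \Phi = \sum_{n \ge 1} (\oplus_{i=1}^n \phi)^\Gamma \, 1_{\Gamma_X^{(n)}}$$

are respectively the transport map and the associated potential for the
Monge-Kantorovitch problem MKP($\mu_{\sigma_1}$, $\mu_{\sigma_2}$, $c$).
Moreover,

$$\mathcal{T}_c(\mu_{\sigma_1}, \mu_{\sigma_2}) = \mathcal{T}_e(\sigma_1, \sigma_2). \tag{9}$$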
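
The identity (9) can be checked by a short computation combining (6) with
Lemma 4.2. The derivation below is a sketch under two stated assumptions:
the number of points of a Poisson process whose intensity is a probability
measure follows a Poisson law of parameter $1$, and the level-wise distance
$\mathcal{T}_c(\mu_n, \nu_n)$ is taken to coincide with
$\mathcal{T}_e(\sigma_1^{\otimes n}, \sigma_2^{\otimes n})$, as the
statement of the theorem suggests.

```latex
\begin{align*}
\mu_{\sigma_1}(\eta(X) = n) &= \frac{e^{-1}}{n!}
  \qquad (\text{since } \sigma_1(X) = 1), \\
\mathcal{T}_c(\mu_{\sigma_1}, \mu_{\sigma_2})^2
  &= \sum_{n \ge 1}
     \mathcal{T}_e(\sigma_1^{\otimes n}, \sigma_2^{\otimes n})^2 \,
     \frac{e^{-1}}{n!}
   = \mathcal{T}_e(\sigma_1, \sigma_2)^2 \; e^{-1} \sum_{n \ge 1} \frac{n}{n!} \\
  &= \mathcal{T}_e(\sigma_1, \sigma_2)^2 \; e^{-1} \sum_{n \ge 1} \frac{1}{(n-1)!}
   = \mathcal{T}_e(\sigma_1, \sigma_2)^2 .
\end{align*}
```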






L. DECREUSEFOND 



Remark 4.2. For finite point processes, it is possible to define a cost between 
configurations by 

We would then have

$$\mathcal{T}_{c_b}(\mu_{\sigma_1}, \mu_{\sigma_2})^2 = (1 - e^{-1}) \, \mathcal{T}_e(\sigma_1, \sigma_2)^2.$$

This distance $\mathcal{T}_{c_b}$ appears in papers of Barbour et al.
[BB92, Xia00].

A Cox process is a doubly-stochastic Poisson process: $\sigma$ is now a
random variable in the set of diffuse Radon measures on $X$ and,
conditionally on $\sigma$, the point process is a Poisson process of
intensity $\sigma$. By conditioning with respect to the intensities, the
proof given above yields the following theorem.

Theorem 4.5. If $\mu$ and $\nu$ are two Cox processes of random intensities
$\sigma_1$ and $\sigma_2$ respectively, such that
$E[\mathcal{T}_e(\sigma_1, \sigma_2)]$ is finite, then

$$\mathcal{T}_c(\mu_{\sigma_1}, \mu_{\sigma_2}) = E[\mathcal{T}_e(\sigma_1, \sigma_2)].$$



5. Locally finite point processes 

We now only assume that $\mu$ is the law of a locally finite point process:
$\mu(\eta(\Lambda) < +\infty) = 1$ for all compact sets $\Lambda$ but
$\mu(\eta(X) = +\infty) = 1$. We can no longer work on the graded space
$\cup_{n \ge 1} \Gamma_X^{(n)}$ since it is $\mu$-negligible. We are in fact
reminded of the case of the Wiener space. It is thus no big surprise that we
can closely follow the beautiful method of [FU04].

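Lemma 5.1. Let $\mu$ and $\nu$ be two probability measures on $\Gamma_X$
such that $\mathcal{T}_c(\mu, \nu)$ is finite. Let $\gamma$ be one optimal
measure and $\Lambda$ be a compact set of $X$. Consider the disintegration
of $\gamma$ along the projection $\pi_1^{\Lambda^c}$, i.e.,

$$\gamma(\,\cdot\,) = \int_{\Gamma_{\Lambda^c}} \gamma(\,\cdot \mid \pi_1^{\Lambda^c}(\eta, \omega) = \eta_{\Lambda^c}) \, d\mu_{\Lambda^c}(\eta_{\Lambda^c}),$$

where $\mu_{\Lambda^c}$ is the image measure of $\mu$ by $\pi_{\Lambda^c}$.
Denote by $\gamma(\,\cdot \mid \eta_{\Lambda^c})$ the regular version of the
conditional probability
$\gamma(\,\cdot \mid \pi_1^{\Lambda^c}(\eta, \omega) = \eta_{\Lambda^c})$.
Then, $\mu_{\Lambda^c}$-a.s., $\gamma(\,\cdot \mid \eta_{\Lambda^c})$ is an
optimal measure for
MKP($\pi_1^\Lambda \gamma(\,\cdot \mid \eta_{\Lambda^c})$,
$p_2 \gamma(\,\cdot \mid \eta_{\Lambda^c})$, $c_{\eta_{\Lambda^c}}$).

Remark 5.1. If we denote by $(N, M)$ a couple of random variables whose
distribution is $\gamma$ and if $N_\Lambda(\eta, \omega) := N(\eta) \cap \Lambda$,
then the previous lemma states that, conditionally on
$(N_{\Lambda^c} = \eta_{\Lambda^c})$, the law of
$(\eta_{\Lambda^c} + N_\Lambda, M)$ is optimal for
MKP($\gamma_{\eta_{\Lambda^c} + N_\Lambda \mid N_{\Lambda^c} = \eta_{\Lambda^c}}$,
$\gamma_{M \mid N_{\Lambda^c} = \eta_{\Lambda^c}}$, $c_{\eta_{\Lambda^c}}$).
Note that within this setting, since the law of $N$ is $\mu$, it is clear
that

$$\gamma_{N \mid N_{\Lambda^c} = \eta_{\Lambda^c}} = \mu_{N \mid N_{\Lambda^c} = \eta_{\Lambda^c}},
  \quad \text{i.e.,} \quad p_1 \gamma(\,\cdot \mid \eta_{\Lambda^c}) = \mu(\,\cdot \mid \eta_{\Lambda^c}).$$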




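Proof of Lemma 5.1. According to the definition of an optimal measure,

$$\begin{aligned}
J_c(\gamma) &= \int_{\Gamma_X \times \Gamma_X} c(\eta, \omega) \, d\gamma(\eta, \omega) \\
&= \int_{\Gamma_X \times \Gamma_X} c(\pi_\Lambda \eta + \pi_{\Lambda^c} \eta, \omega) \, d\gamma(\eta, \omega) \\
&= \int_{\Gamma_{\Lambda^c}} d\mu_{\Lambda^c}(\eta_{\Lambda^c}) \int_{\Gamma_X \times \Gamma_X} c(\eta_{\Lambda^c} + \pi_\Lambda \eta, \omega) \, d\gamma(\eta, \omega \mid \eta_{\Lambda^c}) \\
&= \int_{\Gamma_{\Lambda^c}} d\mu_{\Lambda^c}(\eta_{\Lambda^c}) \int_{\Gamma_\Lambda \times \Gamma_X} c_{\eta_{\Lambda^c}}(\eta, \omega) \, d(\pi_\Lambda \otimes \mathrm{Id})\gamma(\eta, \omega \mid \eta_{\Lambda^c}) \\
&= \int_{\Gamma_{\Lambda^c}} J_{c_{\eta_{\Lambda^c}}}\big((\pi_\Lambda \otimes \mathrm{Id})\gamma(\cdot \mid \eta_{\Lambda^c})\big) \, d\mu_{\Lambda^c}(\eta_{\Lambda^c}).
\end{aligned}$$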

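Now, note that $(\pi_\Lambda \otimes \mathrm{Id})\gamma(\cdot \mid \eta_{\Lambda^c})$ has marginals $\pi_\Lambda\mu(\cdot \mid \eta_{\Lambda^c})$ and $p_2\gamma(\cdot \mid \eta_{\Lambda^c})$,
which are probability measures on $\Gamma_\Lambda$ and $\Gamma_X$ respectively. Let $\mathfrak{M}_1(\Gamma_\Lambda \times \Gamma_X)$
be the set of probability measures on $\Gamma_\Lambda \times \Gamma_X$. Define the sets $B$ and $C$ as

$$B = \big\{(\eta_{\Lambda^c}, \theta) : \theta \in \Sigma\big(\pi_\Lambda\mu(\cdot \mid \eta_{\Lambda^c}),\, p_2\gamma(\cdot \mid \eta_{\Lambda^c})\big)\big\},$$

$$C = \big\{(\eta_{\Lambda^c}, \theta) : J_{c_{\eta_{\Lambda^c}}}(\theta) < J_{c_{\eta_{\Lambda^c}}}\big(\gamma(\cdot \mid \eta_{\Lambda^c})\big)\big\}.$$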

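Let $K$ be the projection on $\Gamma_{\Lambda^c}$ of $C$. Since $B$ and $C$ are Borel, $K$ is a Souslin
set, hence $\mu_{\Lambda^c}$-measurable. Thus there exists a measurable map $\theta$ from $K$
to $\mathfrak{M}_1(\Gamma_\Lambda \times \Gamma_X)$ such that $(\eta_{\Lambda^c}, \theta(\eta_{\Lambda^c}))$ belongs to $C$, for $\mu_{\Lambda^c}$-almost-all
$\eta_{\Lambda^c}$. Define a measure $\theta$ as:

$$\theta = \int_K \theta(\eta_{\Lambda^c}) \, d\mu_{\Lambda^c}(\eta_{\Lambda^c}) + \int_{K^c} (\pi_\Lambda \otimes \mathrm{Id})\gamma(\cdot \mid \eta_{\Lambda^c}) \, d\mu_{\Lambda^c}(\eta_{\Lambda^c}).$$

If $\mu_{\Lambda^c}(K) > 0$, then

$$\begin{aligned}
J_c(\theta) &= \int_{K^c} J_{c_{\eta_{\Lambda^c}}}\big((\pi_\Lambda \otimes \mathrm{Id})\gamma(\cdot \mid \eta_{\Lambda^c})\big) \, d\mu_{\Lambda^c}(\eta_{\Lambda^c}) + \int_K J_{c_{\eta_{\Lambda^c}}}\big(\theta(\eta_{\Lambda^c})\big) \, d\mu_{\Lambda^c}(\eta_{\Lambda^c}) \\
&< \int_{K^c} J_{c_{\eta_{\Lambda^c}}}\big((\pi_\Lambda \otimes \mathrm{Id})\gamma(\cdot \mid \eta_{\Lambda^c})\big) \, d\mu_{\Lambda^c}(\eta_{\Lambda^c}) + \int_K J_{c_{\eta_{\Lambda^c}}}\big((\pi_\Lambda \otimes \mathrm{Id})\gamma(\cdot \mid \eta_{\Lambda^c})\big) \, d\mu_{\Lambda^c}(\eta_{\Lambda^c}) \\
&= J_c(\gamma),
\end{aligned}$$

which contradicts the optimality of $\gamma$. □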

Theorem 5.1. Assume that the hypothesis of Lemma 5.1 holds and assume
that $\mu$ is regular. Let $\Lambda$ be any compact subset of $X$. Then, there exists a
measurable map $T_\Lambda$ from $\Gamma_X$ to itself such that

$$\gamma\big((\eta, \omega) : \pi_2^{\Lambda}(\{\beta_{\eta,\omega}\}) = T_\Lambda(\pi_\Lambda \eta, \pi_{\Lambda^c} \eta)\big) = 1.$$

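Proof. Fix $\eta_{\Lambda^c} \in \Gamma_{\Lambda^c}$ and define $C_{\eta_{\Lambda^c}}$ as the support of $\gamma(\cdot \mid \eta_{\Lambda^c})$. Consider
the two sets:

$$K_{\eta_{\Lambda^c}} = \big\{(\eta, \pi_2^{\Lambda}(\{\beta_{\eta + \eta_{\Lambda^c},\, \omega}\})) \in \Gamma_\Lambda \times \Gamma_X : (\eta + \eta_{\Lambda^c}, \omega) \in C_{\eta_{\Lambda^c}}\big\}$$

and

$$K_{\eta_{\Lambda^c}, \eta} = \big\{\omega \in \Gamma_X : (\eta, \omega) \in K_{\eta_{\Lambda^c}}\big\}.$$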
We know from Theorem 4.2 that for $\mu_{\Lambda^c}$-almost all $\eta_{\Lambda^c}$, $K_{\eta_{\Lambda^c}, \eta}$ is reduced
to one point for $\mu_\Lambda(\cdot \mid \eta_{\Lambda^c})$-almost-all $\eta$. Let

$$N = \big\{(\eta, \eta_{\Lambda^c}) \in \Gamma_\Lambda \times \Gamma_{\Lambda^c} : \mathrm{Card}(K_{\eta_{\Lambda^c}, \eta}) > 1\big\}.$$

$N$ is a Souslin set, hence it is universally measurable. Let $a$ be the measure
defined as the image of $\mu$ under the projection $\eta \mapsto (\pi_\Lambda \eta, \pi_{\Lambda^c} \eta)$. We then
have

$$a(N) = \int d\mu_{\Lambda^c}(\eta_{\Lambda^c}) \int \mathbf{1}_N(\eta, \eta_{\Lambda^c}) \, \mu(d\eta \mid \eta_{\Lambda^c}) = 0.$$

Hence, $\mu$- and $\gamma$-almost surely, $K_{\eta_{\Lambda^c}, \eta}$ is reduced to a single point and we
define $T_\Lambda$ as the map which sends $(\eta, \eta_{\Lambda^c})$ to this point. It is automatically
measurable by the selection theorem. □

Theorem 5.2. Assume that the hypothesis of Lemma 5.1 holds and assume
that $\mu$ is regular. Let $(\Lambda_n, n \ge 1)$ be an increasing sequence of compact sets
such that $\bigcup_{n \ge 1} \Lambda_n = X$. Then, there exists a unique optimal measure $\gamma$ and
a unique map $T$ such that

$$\gamma = (\mathrm{Id} \otimes T)^*\mu.$$

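Proof. Let $\gamma$ be an optimal measure for MKP($\mu$, $\nu$, $c$). According to Theorem 5.1,
we know that

$$\pi_2^{\Lambda_n}(\{\beta_{\eta,\omega}\}) = T_{\Lambda_n}(\pi_{\Lambda_n} \eta, \pi_{\Lambda_n^c} \eta),$$

$\gamma$-a.s. for all integers $n$. Let $B$ be a bounded subset of $X$; we clearly have

$$\pi_2^{\Lambda_n}(\beta_{\eta,\omega})(B) \le \omega(B) < \infty,$$

for any $\beta_{\eta,\omega}$ realizing $c(\eta, \omega)$. Thus, for $\gamma$-almost all $(\eta, \omega)$, the family
$(\pi_2^{\Lambda_n}(\{\beta_{\eta,\omega}\}), n \ge 1)$ is tight in $\Gamma_X$ (see [Kal83]). Hence, up to the extraction
of a subsequence, one can assume that $\pi_2^{\Lambda_n}(\{\beta_{\eta,\omega}\})$ converges to $\omega$. On the
other hand, $\pi_{\Lambda_n} \eta$ converges to $\eta$ and $\pi_{\Lambda_n^c} \eta$ converges to the empty configuration as $n$ goes to infinity.
Define $T$ by $T(\eta) = \lim_{n \to \infty} T_{\Lambda_n}(\pi_{\Lambda_n} \eta, \pi_{\Lambda_n^c} \eta)$; we clearly have $\omega = T(\eta)$,
$\gamma$-a.s. The conclusion follows by Theorem 2.1. □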

We did not manage to find a sufficient condition ensuring the
finiteness of the Wasserstein distance between two locally finite point processes.
However, we do know that relevant cases exist. Consider,
for instance, a Poisson process of non-finite intensity $\sigma_1$ and a
map $h$ from $X$ to itself such that $\int \|h\|^2 \, d\sigma_1$ is finite. Then,



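$$\mathcal{T}_c\big(\mu_{\sigma_1}, (\mathrm{Id} + h^\Gamma)^*\mu_{\sigma_1}\big) \le \int_{\Gamma_X} \int_X \|x - (\mathrm{Id} + h)(x)\|^2 \, d\eta(x) \, d\mu_{\sigma_1}(\eta) = \int_X \|h\|^2 \, d\sigma_1 < \infty.$$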



Note that $(\mathrm{Id} + h^\Gamma)^*\mu_{\sigma_1}$ is a Poisson process of intensity $(\mathrm{Id} + h)^*\sigma_1$.
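
The last step of the bound above rests on the fact that, for a Poisson process, the expectation of the sum of a function over the points of the configuration equals the integral of that function against the intensity (Campbell's formula). The following Python sketch checks this numerically in a simplified setting. Everything specific in it is an assumption made for illustration and is not taken from the text: the space is the half-line, the intensity is a constant `rate` times Lebesgue measure (hence non-finite), the displacement is the hypothetical choice h(x) = exp(-x), and the process is truncated to a window [0, length], which discards only a term of order exp(-2 * length).

```python
import math
import random


def sample_poisson_points(rate, length, rng):
    """Sample a homogeneous Poisson process of intensity `rate` on [0, length]
    by accumulating i.i.d. exponential gaps between consecutive points."""
    points = []
    x = rng.expovariate(rate)
    while x <= length:
        points.append(x)
        x += rng.expovariate(rate)
    return points


def h(x):
    # Hypothetical displacement field, square integrable on [0, +inf).
    return math.exp(-x)


def displacement_cost(points, shift):
    """Quadratic cost of the coupling that moves each point x to x + shift(x):
    the sum over the configuration of |x - (x + shift(x))|^2 = shift(x)^2."""
    return sum(shift(x) ** 2 for x in points)


def mean_displacement_cost(rate, length, n_samples, seed=0):
    """Monte Carlo estimate of the expected displacement cost on the window."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += displacement_cost(sample_poisson_points(rate, length, rng), h)
    return total / n_samples


def exact_bound(rate, length):
    """Closed form of rate * integral over [0, length] of exp(-2x) dx."""
    return rate * (1.0 - math.exp(-2.0 * length)) / 2.0


if __name__ == "__main__":
    rate, length = 5.0, 10.0
    print("Monte Carlo estimate:", mean_displacement_cost(rate, length, 5000))
    print("Integral of |h|^2   :", exact_bound(rate, length))
```

The two printed values should agree up to Monte Carlo error. Note that the sketch only evaluates the cost of the particular coupling that moves each point x to x + h(x); this gives an upper bound on the optimal cost, not the optimal cost itself.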